Robust supervised topic models under label noise
Abstract
Recently, statistical topic modeling approaches have been widely applied to supervised document classification. However, there has been little research on such models under label noise, which exists in many real-world applications: large-scale datasets are often collected from websites or annotated by human workers of varying quality, and therefore contain mislabeled items. In this paper, we propose two robust topic models for classification problems: Smoothed Labeled LDA (SL-LDA) and Adaptive Labeled LDA (AL-LDA). SL-LDA is an extension of Labeled LDA (L-LDA), a classical supervised topic model. The proposed model overcomes a shortcoming of L-LDA, namely overfitting to noisy labels, through Dirichlet smoothing. AL-LDA is an iterative optimization framework built on SL-LDA. At each step of the procedure, it updates the Dirichlet prior, which incorporates the observed labels, using a concise algorithm derived from the maximum-entropy and minimum-cross-entropy principles. This approach avoids identifying which labels are noisy, a common difficulty for existing noise-cleaning algorithms. Quantitative experimental results under noisy-completely-at-random (NCAR) and Multiple Noisy Sources (MNS) settings demonstrate the outstanding performance of our models on noisy labels. In particular, AL-LDA has significant advantages relative to state-of-the-art methods under massive noise.
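To make the Dirichlet-smoothing idea behind SL-LDA concrete, the sketch below shows one way a smoothed per-document label prior could be built: instead of restricting a document's topics to exactly its observed labels (as in L-LDA), every label keeps a small positive prior mass, so a mislabeled document is not forced onto the wrong topics. This is a minimal illustrative sketch, not the authors' implementation; the function name smoothed_label_prior and the hyperparameters eta and epsilon are assumptions made here for illustration.

```python
import numpy as np

def smoothed_label_prior(labels, num_labels, eta=1.0, epsilon=0.01):
    """Return a per-document Dirichlet prior over label-topics.

    labels     -- iterable of observed (possibly noisy) label indices
    num_labels -- total number of labels / label-topics
    eta        -- prior mass placed on each observed label (assumed name)
    epsilon    -- smoothing mass shared by all labels; epsilon = 0
                  recovers the hard label restriction of L-LDA
    """
    # Start with a small uniform mass on every label ...
    alpha = np.full(num_labels, epsilon, dtype=float)
    # ... and add extra mass on the labels actually observed.
    for l in labels:
        alpha[l] += eta
    return alpha

# Example: a document annotated with labels {0, 2} out of 4 labels.
print(smoothed_label_prior([0, 2], num_labels=4))
# -> [1.01 0.01 1.01 0.01]; with epsilon > 0 the unobserved labels
#    remain reachable, which is what makes the prior robust to noise.
```

A usage note on the design choice: with epsilon > 0, a document whose observed label is wrong can still place probability on its true label-topic during inference, which is the intuition behind SL-LDA's robustness described in the abstract above.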
Similar articles
Supervised Topic Models
We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict respons...
Robust Loss Functions under Label Noise for Deep Neural Networks
In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate app...
Robust Semi-Supervised Learning through Label Aggregation
Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real world applications increases significantly, conventional semi-supervised algorithms usually lead to massive computational cost and cannot be applied to large scale datasets. In addition, label noise is usually present in the practical applications due to human annotation, which ...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
Supervised dimension reduction with topic models
We consider supervised dimension reduction (SDR) for problems with discrete variables. Existing methods are computationally expensive, and often do not take the local structure of data into consideration when searching for a low-dimensional space. In this paper, we propose a novel framework for SDR which is (1) general and flexible so that it can be easily adapted to various unsupervised topic ...
Journal
Journal title: Machine Learning
Year: 2021
ISSN: ['0885-6125', '1573-0565']
DOI: https://doi.org/10.1007/s10994-021-05967-y